Speech Segregation Based on Pitch Tracking and Amplitude Modulation
Authors
Abstract
Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. In this paper, we extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
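The two cues named in the abstract, pitch and amplitude modulation, can be illustrated with a minimal sketch. This is not the oscillator-network model of the paper or of Wang and Brown. It only assumes a plain autocorrelation peak picker and an FFT-based envelope, and the function names, the 80-400 Hz search range, and the synthetic test signals are all hypothetical choices made for illustration.

```python
import numpy as np


def autocorr_pitch(x, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate a repetition rate (Hz) from the largest autocorrelation peak.

    Illustrative only: fmin/fmax are assumed search bounds, not values
    taken from the paper.
    """
    x = x - np.mean(x)
    # Keep non-negative lags only: index 0 is lag 0.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


def amplitude_envelope(x):
    """Amplitude envelope as the magnitude of the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Zero the negative frequencies and double the positive ones.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


# Synthetic "voiced" frame with an assumed fundamental of 200 Hz.
sample_rate = 16000
t = np.arange(int(0.05 * sample_rate)) / sample_rate
f0 = 200.0

# Low-frequency channel: resolved harmonics, pitch read from the waveform.
low = sum(np.sin(2 * np.pi * k * f0 * t) for k in (1, 2, 3))
# High-frequency channel: neighbouring harmonics beat at the fundamental,
# so the pitch shows up as the repetition rate of the envelope (AM).
high = sum(np.sin(2 * np.pi * k * f0 * t) for k in (10, 11, 12))

low_f0 = autocorr_pitch(low, sample_rate)
am_rate = autocorr_pitch(amplitude_envelope(high), sample_rate)
print(low_f0, am_rate)
```

In this toy setting both estimates should land near the assumed 200 Hz fundamental, which is the sense in which a shared AM rate can tie high-frequency channels to the same source as the low-frequency pitch.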
Similar resources
A Hybrid Approach for Co-Channel Speech Segregation based on CASA, HMM Multipitch Tracking, and Medium Frame Harmonic Model
This paper proposes a hybrid approach for cochannel speech segregation. A hidden Markov model (HMM) is used to track the pitches of two talkers. The resulting pitch tracks are then enriched with the prominent pitch. The enriched tracks are correctly grouped using pitch continuity. Medium frame harmonics are used to extract the second pitch for frames with only one pitch deduced using the previous s...
Speech analysis and synthesis using an AM-FM modulation model
In this paper, the AM-FM modulation model is applied to speech analysis, synthesis and coding. The multiband demodulation pitch tracking algorithm is proposed that produces smooth and accurate fundamental frequency contours. The AM-FM modulation vocoder represents speech as the sum of resonance signals modeled by their amplitude envelope and instantaneous frequency signals. Efficient modeling an...
On Amplitude Modulation for Monaural Speech Segregation
We propose a computational auditory scene analysis (CASA) model for monaural speech segregation. It deals with low-frequency and high-frequency signals differently. For high-frequency signals, it generates segments based on common amplitude modulation (AM) and groups them according to AM repetition rates. This model performs substantially better than previous CASA systems.
The role of temporal resolution in modulation-based speech segregation
This study is concerned with the challenge of automatically segregating a target speech signal from interfering background noise. A computational speech segregation system is presented which exploits logarithmically-scaled amplitude modulation spectrogram (AMS) features to distinguish between speech and noise activity on the basis of individual time-frequency (T-F) units. One important paramete...
The Function of Pitch Range Variations in Samples of Emotional Expressions in Persian
This study aims at investigating the interface between emotion and intonation patterns (more specifically, duration and pitch amplitude of speech). To this end, the acoustic properties of spectral parameters related to speech prosody are investigated. The results of acoustic and statistical analysis show that mean level and range of F0 in the contours vary strongly as a function of the degree o...
Publication date: 2004